In this part, a deep learning approach is used to classify whether a presented image is a beach, a forest, or a mountain. Deep learning is commonly applied to image classification, object detection, facial recognition, and similar tasks.
# Data wrangling
library(tidyverse)
# Image manipulation
library(imager)
# Deep learning
# devtools::install_github("rstudio/keras")
library(keras)
# install_keras(method = c("conda"))
library(tensorflow)
# Model Evaluation
library(caret)
options(scipen = 999)
You need to install the pillow package in your conda environment to manipulate image data. Here is a short guide on how to create a new conda environment with tensorflow and pillow inside it.
1. Open the terminal, either in the Anaconda command prompt or directly in RStudio.
2. Create a new conda environment by running the following command.
conda create -n r-tensorflow python=3.9
3. Activate the conda environment by running the following command.
conda activate r-tensorflow
4. Install the tensorflow package into the environment.
conda install -c conda-forge tensorflow=2.12.0 or conda install tensorflow=2.12.0
5. Install the pillow package.
conda install pillow
In image classification problems, it is common practice to put the images in separate folders based on the target class/label. For example, inside the train folder in our data you can see that we have 3 different folders, for beach, forest, and mountain respectively.
Let's get the file name of each image. First, find the folder of each target class by listing the folder names inside the train folder.
#> [1] "beach" "forest" "mountain"
Combine each folder name with the path of the train folder in order to access the contents of each folder.
#> [1] "data/train/beach/" "data/train/forest/" "data/train/mountain/"
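The two outputs above can presumably be reproduced with list.files() and paste0(); a minimal sketch, assuming the data sits under data/train/:

```r
# List the class folders inside the train directory (path is an assumption)
folder_list <- list.files("data/train/")
# Combine each folder name with the parent path
folder_path <- paste0("data/train/", folder_list, "/")
folder_path
```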
Use the map() function to iterate over the three folders (beach, forest, mountain) and collect the file names in each. Since map() returns a list, use unlist() to combine the file names from the 3 folders into one vector.
# Get file name
file_name <- map(folder_path,
function(x) paste0(x, list.files(x))
) %>%
unlist()
# first 6 file names
head(file_name)
#> [1] "data/train/beach/beach_100.jpeg" "data/train/beach/beach_101.jpeg"
#> [3] "data/train/beach/beach_102.jpeg" "data/train/beach/beach_103.jpeg"
#> [5] "data/train/beach/beach_104.jpeg" "data/train/beach/beach_105.jpeg"
Check the last 6 images using the tail() function.
#> [1] "data/train/mountain/mountain_a_45.jpeg"
#> [2] "data/train/mountain/mountian_115.jpeg"
#> [3] "data/train/mountain/mountian_45.jpeg"
#> [4] "data/train/mountain/mountian_72.jpeg"
#> [5] "data/train/mountain/mountian_74.jpeg"
#> [6] "data/train/mountain/mountian_95.jpeg"
Let's check how many images there are in file_name.
#> [1] 1328
To inspect the content of a file, we can use the load.image() function from the imager package. Let's randomly visualize 6 images from the data.
# Randomly select image
set.seed(91)
sample_image <- sample(file_name, 6)
# Load image into R
img <- map(sample_image, load.image)
# Plot image
par(mfrow = c(2, 3)) # Create 2 x 3 image grid
invisible(map(img, plot))
An important part of image classification is understanding the distribution of the image dimensions, so that a proper input dimension can be chosen when building the deep learning model.
#> Image. Width: 279 pix Height: 181 pix Depth: 1 Colour channels: 3
The result above shows the dimensions of one image. Height and width are measured in pixels. The colour channels indicate whether the image is in grayscale format (colour channels = 1) or RGB format (colour channels = 3). Use the dim() function to get the value of each dimension.
#> [1] 279 181 1 3
The result above shows that the width = 279, height = 181, depth = 1, and channels = 3 (RGB); note that imager stores dimensions in width, height, depth, channel order.
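The width × height × depth × channel layout can be mimicked with a plain R array; a toy sketch (values arbitrary, same shape as the image above):

```r
# A 4-D array laid out like imager's cimg: width, height, depth, colour channels
img_array <- array(runif(279 * 181 * 1 * 3), dim = c(279, 181, 1, 3))
dim(img_array)
#> [1] 279 181   1   3
```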
Let's create a function that extracts the height and width of an image and stores them in a data.frame.
# Function for acquiring width and height of an image
get_dim <- function(x){
img <- load.image(x)
df_img <- data.frame(height = height(img),
width = width(img),
filename = x
)
return(df_img)
}
# Implementation
file_dim <- map_df(file_name, get_dim)
file_dim
The result above shows the images' heights and widths collected into a data frame; each image has a different height and width.
#> height width filename
#> Min. : 94.0 Min. :100.0 Length:1328
#> 1st Qu.:168.0 1st Qu.:268.0 Class :character
#> Median :183.0 Median :275.0 Mode :character
#> Mean :178.2 Mean :282.9
#> 3rd Qu.:184.0 3rd Qu.:300.0
#> Max. :314.0 Max. :534.0
From the summary() output, it can be concluded that the image quality is medium to low: 50% of the data have heights between 168 and 184 px and widths between 268 and 300 px. The maximum image height is 314 px and the minimum is 94 px, while the maximum image width is 534 px and the minimum is 100 px.
Next, determine the input image dimensions for the deep learning model based on summary(file_dim). All input images must have the same dimensions, so every image is resized to 128 × 128 pixels. The higher the image dimensions, the longer training takes; but if the images are too small, a lot of information is lost. The batch size is set to 15, so the model weights are updated after each batch of 15 images.
# Desired height and width of images
target_size <- c(128,128)
# Batch size for training the model
batch_size <- 15
Because the training set is small, the image_data_generator() function from the keras package is used for image augmentation, which leverages the training set without acquiring new images. In data augmentation the images are modified (e.g. flipped, rotated, zoomed, cropped, de-texturized, de-colorized, edge-enhanced, etc.) so that the model generalizes better.
Create the image generator for keras with the properties shown in the code below.
You can explore more features about the image generator on this link.
# Image Generator
train_data_gen <-
image_data_generator(rescale = 1/255,
horizontal_flip = T,
width_shift_range = 0.2,
height_shift_range = 0.2,
zoom_range = 0.2,
brightness_range = c(1,2),
fill_mode = "nearest",
validation_split = 0.2)
The generator is applied using flow_images_from_directory(). The data sits inside the train folder under the data folder, so the directory is data/train. This process yields augmented images for both the training data and the validation data, split 80% for training and 20% for validation. The training set is used to train the model, while the validation set is used for evaluation.
# Training Dataset
train_image_array_gen <-
flow_images_from_directory(
directory = "data/train/",
# Folder of the data
target_size = target_size,
# target of the image dimension (128 x 128)
color_mode = "rgb",
# use RGB color
batch_size = batch_size ,
seed = 123,
# set random seed
subset = "training",
# declare that this is for training data
generator = train_data_gen
)
# Validation Dataset
val_image_array_gen <-
flow_images_from_directory(
directory = "data/train/",
target_size = target_size,
color_mode = "rgb",
batch_size = batch_size ,
seed = 123,
subset = "validation",
# declare that this is the validation data
generator = train_data_gen
)
Check the class proportions of the training data set. The indexes correspond to the labels of the target variable, ordered alphabetically (beach, forest, mountain).
# Number of training samples
train_sample <- train_image_array_gen$n
# Number of validation samples
valid_sample <- val_image_array_gen$n
# Number of target classes/categories
output_n <- n_distinct(train_image_array_gen$classes)
# Get the class proportion
# train
table("\nFrequency" = factor(train_image_array_gen$classes)
) %>%
prop.table()
#>
#> Frequency
#> 0 1 2
#> 0.3214286 0.3045113 0.3740602
#>
#> Frequency
#> 0 1 2
#> 0.3219697 0.3030303 0.3750000
After checking the class proportions (the first table for the training set, the second for the validation set), it can be concluded that the classes are balanced in both.
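prop.table() simply divides each class count by the total; a self-contained toy check with made-up class indexes:

```r
# Toy class index vector: 2 of class 0, 1 of class 1, 3 of class 2
classes <- c(0, 0, 1, 2, 2, 2)
prop.table(table(classes))
```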
The Convolutional Neural Network (CNN), built around convolutional layers, is a popular architecture for image classification. An image is a 2-dimensional array of height and width with 3 RGB (Red, Green, Blue) colour channels. Once the image has been converted to pixel values, the architecture of the neural network can be designed to take advantage of this structure.
CNNs transform the input data from the input layer through all connected
layers into a set of class scores given by the output layer. The
feature-extraction layers have a general repeating pattern of the
sequence:
1. Convolution layer
The Rectified Linear Unit (ReLU) activation function is often drawn as its own layer to match up with other literature.
2. Pooling layer
These layers find a number of features in the images and progressively construct higher-order features. This corresponds directly to the ongoing theme in deep learning by which features are automatically learned as opposed to traditionally hand engineered.
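The downsampling performed by a pooling layer can be sketched on a tiny matrix; a toy 2 × 2 max pooling with stride 2 in plain R (illustration only, not the keras layer):

```r
# Max pooling: keep the maximum of each non-overlapping 2 x 2 block
max_pool_2x2 <- function(m) {
  out <- matrix(0, nrow(m) / 2, ncol(m) / 2)
  for (i in seq_len(nrow(out))) {
    for (j in seq_len(ncol(out))) {
      out[i, j] <- max(m[(2 * i - 1):(2 * i), (2 * j - 1):(2 * j)])
    }
  }
  out
}
m <- matrix(1:16, nrow = 4, byrow = TRUE)
max_pool_2x2(m)  # a 4 x 4 input shrinks to 2 x 2
```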
Finally we have the classification layers, in which one or more fully connected layers take the higher-order features and produce class probabilities or scores. These layers are fully connected to all of the neurons in the previous layer, as their name implies. They typically produce a two-dimensional output of dimensions [b × N], where b is the number of examples in the mini-batch and N is the number of classes we're interested in scoring.
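The class scores mentioned above are turned into probabilities by the softmax activation used in the output layer; a minimal sketch in plain R:

```r
# Softmax: exponentiate the scores and normalize so they sum to 1
softmax <- function(z) {
  e <- exp(z - max(z))  # subtract the max for numerical stability
  e / sum(e)
}
scores <- c(beach = 2.0, forest = 0.5, mountain = 1.0)
round(softmax(scores), 3)
```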
Let's build a simple model first with the following layers:
1. A convolutional layer with the relu activation function.
2. A max pooling layer.
3. A flattening layer.
4. A dense layer with the relu activation function.
5. An output layer with the softmax activation function.
Don't forget to set the input size in the first layer. If the input image is in RGB, set the final number to 3, the number of colour channels. If the input image is in grayscale, set the final number to 1.
#> [1] 128 128 3
# Set Initial Random Weight
# tensorflow::tf$random$set_seed(121)
tensorflow::set_random_seed(105)
model <- keras_model_sequential(name = "simple_model") %>%
# Convolution Layer
layer_conv_2d(filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(target_size, 3)
) %>%
# Max Pooling Layer
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening Layer
layer_flatten() %>%
# Dense Layer
layer_dense(units = 1000,
activation = "relu") %>%
# Output Layer
layer_dense(units = output_n,
activation = "softmax",
name = "Output")
model
#> Model: "simple_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> conv2d (Conv2D) (None, 128, 128, 32) 896
#> max_pooling2d (MaxPooling2D) (None, 64, 64, 32) 0
#> flatten (Flatten) (None, 131072) 0
#> dense (Dense) (None, 1000) 131073000
#> Output (Dense) (None, 3) 3003
#> ================================================================================
#> Total params: 131,076,899
#> Trainable params: 131,076,899
#> Non-trainable params: 0
#> ________________________________________________________________________________
After compiling the model by specifying the loss function and the optimizer, the model is fitted on the training data and evaluated on the validation data. This step uses 40 epochs, a 0.001 learning rate, categorical cross-entropy as the loss function, and the sgd optimizer.
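The steps_per_epoch argument used below follows from the counts reported earlier, assuming 1064 training images (1328 total minus the 264 validation images):

```r
# One epoch = number of whole batches that cover the training set
train_sample <- 1328 - 264  # total images minus validation images (from earlier output)
batch_size <- 15
as.integer(train_sample / batch_size)  # truncates to 70 update steps per epoch
```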
model %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_sgd(learning_rate = 0.001),
metrics = "accuracy"
)
# Fit data into model
history <- model %>%
fit_generator(
# training data
train_image_array_gen,
# training epochs
steps_per_epoch = as.integer(train_sample / batch_size),
epochs = 40,
# validation data
validation_data = val_image_array_gen,
validation_steps = as.integer(valid_sample / batch_size))
plot(history)
Before the model is evaluated with a confusion matrix, we need the file names of the images used as validation data, and we extract the categorical label from each file name as the actual value of the target variable.
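The class label can be pulled straight out of each file path with str_extract() and an alternation pattern; a quick check on one sample path (stringr is loaded via tidyverse):

```r
library(stringr)
# The pattern matches whichever class name appears in the path
path <- "data/train/beach/beach_100.jpeg"
str_extract(path, "beach|forest|mountain")
#> [1] "beach"
```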
val_data_untuned <- data.frame(file_name = paste0("data/train/", val_image_array_gen$filenames)) %>%
mutate(class = str_extract(file_name, "beach|forest|mountain"))
head(val_data_untuned, 100)
Next, obtain the images by converting them into an array, since each image has 2 spatial dimensions and 3 colour channels (RGB). The model then predicts on the original images from the folder rather than on the image generator, since the generator transforms the images and would not reflect the actual data.
# Function to convert image to array
image_prep <- function(x) {
arrays <- lapply(x, function(path) {
img <- image_load(path, target_size = target_size,
grayscale = F # Set FALSE if image is RGB
)
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- x/255 # rescale image pixel
})
do.call(abind::abind, c(arrays, list(along = 1)))
}
# (assumed step) build the prediction array from the validation file paths
test_x_untuned <- image_prep(val_data_untuned$file_name)
dim(test_x_untuned)
#> [1] 264 128 128 3
pred_test_untuned <- predict(model, test_x_untuned) %>%
k_argmax() %>% # for taking the highest probability
as.array() %>%
as.factor()
head(pred_test_untuned, 10)
#> [1] 0 0 0 0 0 0 0 0 0 0
#> Levels: 0 1 2
Convert the encoding into class labels for easier interpretation.
# Convert encoding to label
decode <- function(x){
case_when(x == 0 ~ "beach",
x == 1 ~ "forest",
x == 2 ~ "mountain"
)
}
pred_test_untuned <- sapply(pred_test_untuned, decode)
pred_test_untuned
#> [1] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"
#> [7] "beach" "beach" "beach" "beach" "beach" "beach"
#> [13] "beach" "beach" "beach" "mountain" "beach" "beach"
#> [19] "beach" "mountain" "beach" "beach" "mountain" "beach"
#> [25] "beach" "beach" "beach" "mountain" "beach" "beach"
#> [31] "beach" "mountain" "mountain" "beach" "mountain" "beach"
#> [37] "beach" "beach" "beach" "beach" "beach" "mountain"
#> [43] "beach" "beach" "beach" "beach" "beach" "mountain"
#> [49] "beach" "beach" "beach" "beach" "mountain" "beach"
#> [55] "mountain" "beach" "beach" "mountain" "beach" "beach"
#> [61] "mountain" "beach" "beach" "mountain" "beach" "beach"
#> [67] "mountain" "beach" "mountain" "beach" "beach" "mountain"
#> [73] "mountain" "beach" "beach" "beach" "beach" "beach"
#> [79] "beach" "beach" "beach" "forest" "beach" "beach"
#> [85] "beach" "forest" "forest" "forest" "forest" "forest"
#> [91] "forest" "forest" "forest" "forest" "forest" "forest"
#> [97] "forest" "forest" "mountain" "forest" "forest" "forest"
#> [103] "forest" "forest" "forest" "mountain" "mountain" "forest"
#> [109] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [115] "forest" "forest" "forest" "forest" "forest" "forest"
#> [121] "forest" "mountain" "forest" "forest" "forest" "forest"
#> [127] "forest" "forest" "forest" "forest" "mountain" "forest"
#> [133] "forest" "forest" "forest" "forest" "forest" "forest"
#> [139] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [145] "forest" "forest" "forest" "forest" "forest" "forest"
#> [151] "forest" "forest" "forest" "forest" "forest" "forest"
#> [157] "forest" "forest" "mountain" "forest" "forest" "mountain"
#> [163] "forest" "forest" "forest" "mountain" "mountain" "forest"
#> [169] "mountain" "mountain" "mountain" "beach" "mountain" "mountain"
#> [175] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [181] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [187] "mountain" "forest" "mountain" "mountain" "mountain" "mountain"
#> [193] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "mountain" "forest" "mountain"
#> [205] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [211] "forest" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [217] "mountain" "forest" "mountain" "mountain" "mountain" "forest"
#> [223] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [229] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "beach" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [247] "mountain" "mountain" "mountain" "forest" "mountain" "beach"
#> [253] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [259] "mountain" "mountain" "mountain" "mountain" "beach" "mountain"
Evaluating the model using the confusion matrix.
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction beach forest mountain
#> beach 66 0 4
#> forest 1 71 7
#> mountain 18 9 88
#>
#> Overall Statistics
#>
#> Accuracy : 0.8523
#> 95% CI : (0.8036, 0.8928)
#> No Information Rate : 0.375
#> P-Value [Acc > NIR] : < 0.0000000000000002
#>
#> Kappa : 0.7764
#>
#> Mcnemar's Test P-Value : 0.01726
#>
#> Statistics by Class:
#>
#> Class: beach Class: forest Class: mountain
#> Sensitivity 0.7765 0.8875 0.8889
#> Specificity 0.9777 0.9565 0.8364
#> Pos Pred Value 0.9429 0.8987 0.7652
#> Neg Pred Value 0.9021 0.9514 0.9262
#> Prevalence 0.3220 0.3030 0.3750
#> Detection Rate 0.2500 0.2689 0.3333
#> Detection Prevalence 0.2652 0.2992 0.4356
#> Balanced Accuracy 0.8771 0.9220 0.8626
#> Model: "simple_model"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> conv2d (Conv2D) (None, 128, 128, 32) 896
#> max_pooling2d (MaxPooling2D) (None, 64, 64, 32) 0
#> flatten (Flatten) (None, 131072) 0
#> dense (Dense) (None, 1000) 131073000
#> Output (Dense) (None, 3) 3003
#> ================================================================================
#> Total params: 131,076,899
#> Trainable params: 131,076,899
#> Non-trainable params: 0
#> ________________________________________________________________________________
Let's recall the untuned model architecture above. The single convolutional layer only extracts low-level image features, which are then downsampled by the max_pooling_2d layer. The output is still a 64 × 64 array, so much information has not been extracted yet before the data is flattened. Therefore, more layers are added to the model to capture more features of the image.
In this tuning stage, the batch size is reduced to 8, while the target size is kept at 128 × 128 px.
# Desired height and width of images
target_size_tuned <- c(128,128)
# Batch size for training the model
batch_size_tuned <- 8
# Image Generator
train_data_gen_tuned <- image_data_generator(rescale = 1/255,
horizontal_flip = T,
width_shift_range = 0.2,
height_shift_range = 0.2,
zoom_range = 0.2,
brightness_range = c(1,2),
fill_mode = "nearest",
validation_split = 0.2)
As before, the generator is applied with flow_images_from_directory() on the data/train directory, splitting the augmented images 80% for training and 20% for validation.
# Training Dataset
train_image_array_gen_tuned <- flow_images_from_directory(directory = "data/train/", # Folder of the data
target_size = target_size_tuned, # target of the image dimension (128 x 128)
color_mode = "rgb", # use RGB color
batch_size = batch_size_tuned ,
seed = 143, # set random seed
subset = "training", # declare that this is for training data
generator = train_data_gen_tuned
)
# Validation Dataset
val_image_array_gen_tuned <- flow_images_from_directory(directory = "data/train/",
target_size = target_size_tuned,
color_mode = "rgb",
batch_size = batch_size_tuned ,
seed = 143,
subset = "validation", # declare that this is the validation data
generator = train_data_gen_tuned
)
# Number of training samples
train_sample_tuned <- train_image_array_gen_tuned$n
# Number of validation samples
valid_sample_tuned <- val_image_array_gen_tuned$n
# Number of target classes/categories
output_n_tuned <- n_distinct(train_image_array_gen_tuned$classes)
# Get the class proportion
# train
table("\nFrequency" = factor(train_image_array_gen_tuned$classes)
) %>%
prop.table()
#>
#> Frequency
#> 0 1 2
#> 0.3214286 0.3045113 0.3740602
Let's build the tuned model architecture as an improved model with the following layers: five convolutional layers (with 32, 64, 128, 256, and 256 filters), each followed by a max pooling layer, then a flattening layer, a dense layer, and the output layer.
As before, the last number of input_shape is 3, which indicates that the input images are RGB. For grayscale input, set the last number of input_shape to 1.
# Set Initial Random Weight
# tensorflow::tf$random$set_seed(121)
tensorflow::set_random_seed(107)
model_tuned <- keras_model_sequential(name = "model_tuned") %>%
# 1st Convolution Layer
layer_conv_2d(filters = 32,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(target_size_tuned, 3)
) %>%
# Max Pooling Layer 1
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# 2nd Convolution Layer
layer_conv_2d(filters = 64,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(target_size_tuned, 3)
) %>%
# Max Pooling Layer 2
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# 3rd Convolution Layer
layer_conv_2d(filters = 128,
kernel_size = c(3,3),
padding = "same",
activation = "relu",
input_shape = c(target_size_tuned, 3)
) %>%
# Max Pooling Layer 3
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# 4th Convolutional layer
layer_conv_2d(filters = 256,
kernel_size = c(3,3),
padding = "same",
activation = "relu"
) %>%
# Max pooling layer 4
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# 5th Convolutional layer
layer_conv_2d(filters = 256,
kernel_size = c(3,3),
padding = "same",
activation = "relu"
) %>%
# Max pooling layer 5
layer_max_pooling_2d(pool_size = c(2,2)) %>%
# Flattening Layer
layer_flatten() %>%
# Dense Layer 1
layer_dense(units = 128,
activation = "relu") %>%
# Output Layer
layer_dense(units = output_n_tuned,
activation = "softmax",
name = "Output")
model_tuned
#> Model: "model_tuned"
#> ________________________________________________________________________________
#> Layer (type) Output Shape Param #
#> ================================================================================
#> conv2d_5 (Conv2D) (None, 128, 128, 32) 896
#> max_pooling2d_5 (MaxPooling2D) (None, 64, 64, 32) 0
#> conv2d_4 (Conv2D) (None, 64, 64, 64) 18496
#> max_pooling2d_4 (MaxPooling2D) (None, 32, 32, 64) 0
#> conv2d_3 (Conv2D) (None, 32, 32, 128) 73856
#> max_pooling2d_3 (MaxPooling2D) (None, 16, 16, 128) 0
#> conv2d_2 (Conv2D) (None, 16, 16, 256) 295168
#> max_pooling2d_2 (MaxPooling2D) (None, 8, 8, 256) 0
#> conv2d_1 (Conv2D) (None, 8, 8, 256) 590080
#> max_pooling2d_1 (MaxPooling2D) (None, 4, 4, 256) 0
#> flatten_1 (Flatten) (None, 4096) 0
#> dense_1 (Dense) (None, 128) 524416
#> Output (Dense) (None, 3) 387
#> ================================================================================
#> Total params: 1,503,299
#> Trainable params: 1,503,299
#> Non-trainable params: 0
#> ________________________________________________________________________________
For fitting the tuned model, the learning rate is increased from 0.001 to 0.01 and the number of epochs is decreased from 40 to 35. Categorical cross-entropy remains the loss function and sgd the optimizer.
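The effect of the larger learning rate is simply a bigger step along the gradient at each update; a one-step SGD sketch on a toy quadratic (illustration only, not the keras optimizer):

```r
# f(w) = (w - 3)^2 has gradient 2 * (w - 3); the minimum is at w = 3
grad <- function(w) 2 * (w - 3)
w <- 0
w - 0.001 * grad(w)  # learning rate 0.001: tiny step, 0.006
w - 0.01  * grad(w)  # learning rate 0.01: a ten times larger step, 0.06
```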
model_tuned %>%
compile(
loss = "categorical_crossentropy",
optimizer = optimizer_sgd(learning_rate = 0.01),
metrics = "accuracy"
)
# Fit data into model
history_tuned <- model_tuned %>%
fit_generator(
# training data
train_image_array_gen_tuned,
# training epochs
steps_per_epoch = as.integer(train_sample_tuned / batch_size_tuned),
epochs = 35,
# validation data
validation_data = val_image_array_gen_tuned,
validation_steps = as.integer(valid_sample_tuned / batch_size_tuned))
# ,
# print progress but don't create graphic
# verbose = 1,
# view_metrics = 0)
plot(history_tuned)
The plot above shows that model_tuned performs well: the loss is close to zero and the difference between training and validation accuracy is small. This means the model is neither underfit nor overfit.
The model is evaluated with a confusion matrix using the validation data from the generator. First, get the file names of the images used as validation data; then extract the categorical label from each file name as the actual value of the target variable.
val_data <- data.frame(file_name = paste0("data/train/", val_image_array_gen_tuned$filenames)) %>%
mutate(class = str_extract(file_name, "beach|forest|mountain"))
head(val_data, 100)
As before, convert the images into an array (2 spatial dimensions, 3 RGB colour channels) and predict on the original images from the folder, not on the image generator.
# Function to convert image to array
image_prep_tuned <- function(x) {
arrays <- lapply(x, function(path) {
img <- image_load(path, target_size = target_size_tuned,
grayscale = F # Set FALSE if image is RGB
)
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- x/255 # rescale image pixel
})
do.call(abind::abind, c(arrays, list(along = 1)))
}
# (assumed step) build the prediction array from the validation file paths
test_x <- image_prep_tuned(val_data$file_name)
dim(test_x)
#> [1] 264 128 128 3
Next, run the predictions and build the confusion matrix for the validation data.
pred_test <- predict(model_tuned, test_x) %>%
k_argmax() %>% # take the class with the highest probability
as.array() %>%
as.factor()
head(pred_test, 10)
#> [1] 0 0 0 0 0 0 0 0 0 0
#> Levels: 0 1 2
Convert the encoding into class labels for easier interpretation.
# Convert encoding to label
decode <- function(x){
case_when(x == 0 ~ "beach",
x == 1 ~ "forest",
x == 2 ~ "mountain"
)
}
pred_test <- sapply(pred_test, decode)
pred_test
#> [1] "beach"    "beach"    "beach"    "beach"    "beach"    "beach"
#> [7] "beach" "beach" "beach" "beach" "beach" "beach"
#> [13] "beach" "beach" "beach" "beach" "beach" "beach"
#> [19] "beach" "mountain" "beach" "beach" "beach" "beach"
#> [25] "beach" "beach" "beach" "beach" "beach" "beach"
#> [31] "beach" "mountain" "beach" "beach" "mountain" "beach"
#> [37] "beach" "beach" "beach" "beach" "beach" "beach"
#> [43] "beach" "beach" "beach" "beach" "beach" "beach"
#> [49] "beach" "beach" "beach" "beach" "mountain" "beach"
#> [55] "beach" "beach" "beach" "mountain" "beach" "beach"
#> [61] "beach" "beach" "beach" "beach" "beach" "beach"
#> [67] "beach" "beach" "beach" "beach" "beach" "beach"
#> [73] "beach" "beach" "beach" "beach" "beach" "beach"
#> [79] "beach" "beach" "beach" "beach" "beach" "beach"
#> [85] "beach" "forest" "forest" "forest" "forest" "forest"
#> [91] "forest" "forest" "forest" "forest" "forest" "forest"
#> [97] "forest" "forest" "forest" "forest" "forest" "forest"
#> [103] "forest" "forest" "forest" "forest" "mountain" "forest"
#> [109] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [115] "forest" "forest" "forest" "forest" "forest" "forest"
#> [121] "forest" "forest" "forest" "forest" "forest" "forest"
#> [127] "forest" "forest" "forest" "forest" "forest" "forest"
#> [133] "forest" "forest" "forest" "forest" "forest" "forest"
#> [139] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [145] "forest" "forest" "forest" "forest" "forest" "forest"
#> [151] "forest" "forest" "forest" "forest" "forest" "forest"
#> [157] "forest" "forest" "forest" "forest" "forest" "forest"
#> [163] "forest" "forest" "forest" "mountain" "mountain" "forest"
#> [169] "beach" "mountain" "beach" "beach" "mountain" "mountain"
#> [175] "mountain" "mountain" "mountain" "mountain" "beach" "mountain"
#> [181] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [187] "beach" "forest" "mountain" "mountain" "beach" "mountain"
#> [193] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [205] "mountain" "mountain" "mountain" "beach" "mountain" "mountain"
#> [211] "forest" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [217] "mountain" "forest" "mountain" "mountain" "mountain" "forest"
#> [223] "mountain" "mountain" "mountain" "beach" "mountain" "mountain"
#> [229] "mountain" "beach" "mountain" "forest" "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "beach" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [247] "mountain" "mountain" "mountain" "mountain" "mountain" "beach"
#> [253] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [259] "mountain" "mountain" "mountain" "mountain" "beach" "mountain"
Evaluate the model using the confusion matrix. This model performs better than the previous one because the additional CNN layers extract more features from the image.
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction beach forest mountain
#> beach 80 0 12
#> forest 0 77 6
#> mountain 5 3 81
#>
#> Overall Statistics
#>
#> Accuracy : 0.9015
#> 95% CI : (0.859, 0.9347)
#> No Information Rate : 0.375
#> P-Value [Acc > NIR] : < 0.00000000000000022
#>
#> Kappa : 0.8521
#>
#> Mcnemar's Test P-Value : NA
#>
#> Statistics by Class:
#>
#> Class: beach Class: forest Class: mountain
#> Sensitivity 0.9412 0.9625 0.8182
#> Specificity 0.9330 0.9674 0.9515
#> Pos Pred Value 0.8696 0.9277 0.9101
#> Neg Pred Value 0.9709 0.9834 0.8971
#> Prevalence 0.3220 0.3030 0.3750
#> Detection Rate 0.3030 0.2917 0.3068
#> Detection Prevalence 0.3485 0.3144 0.3371
#> Balanced Accuracy 0.9371 0.9649 0.8848
The model_tuned confusion matrix shows that the model can classify beach, forest, and mountain with 90.15% accuracy on the validation data.
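The 90.15% figure can be verified directly from the confusion matrix above; a quick check with the counts copied from the caret output:

```r
# Rows = prediction, columns = reference, as in the caret output above
cm <- matrix(c(80,  0, 12,
                0, 77,  6,
                5,  3, 81),
             nrow = 3, byrow = TRUE,
             dimnames = list(pred = c("beach", "forest", "mountain"),
                             ref  = c("beach", "forest", "mountain")))
sum(diag(cm)) / sum(cm)  # correct predictions over all 264 validation images
#> [1] 0.9015152
```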
So that the tuned model can also be evaluated with a confusion matrix on the test data, the contents of the test folder are divided into three class folders (beach, forest, mountain), with the images placed in their respective folders. Then the categorical label is extracted as the actual value of the target variable.
# Validation Dataset
test_image_array_gen_tuned <- flow_images_from_directory(directory = "data/test/",
target_size = target_size_tuned,
color_mode = "rgb",
batch_size = batch_size_tuned ,
seed = 173,
generator = train_data_gen_tuned
)
# Extract
val_test_data <- data.frame(file_name = paste0("data/test/", test_image_array_gen_tuned$filenames)) %>%
mutate(class = str_extract(file_name, "beach|forest|mountain"))
val_test_data
As before, convert the images into an array and use the model to predict on the original images from the folder, not on the image generator.
# Function to convert image to array
image_prep_tuned <- function(x) {
arrays <- lapply(x, function(path) {
img <- image_load(path, target_size = target_size_tuned,
grayscale = F # Set FALSE if image is RGB
)
x <- image_to_array(img)
x <- array_reshape(x, c(1, dim(x)))
x <- x/255 # rescale image pixel
})
do.call(abind::abind, c(arrays, list(along = 1)))
}
# (assumed step) build the prediction array from the test file paths
test1_x <- image_prep_tuned(val_test_data$file_name)
dim(test1_x)
#> [1] 294 128 128 3
Next, run the predictions and build the confusion matrix for the test data.
pred1_test <- predict(model_tuned, test1_x) %>%
k_argmax() %>% # take the class with the highest probability
as.array() %>%
as.factor()
head(pred1_test, 10)
#> [1] 0 2 0 0 0 0 0 0 0 0
#> Levels: 0 1 2
Convert the encoding into class labels for easier interpretation.
# Convert encoding to label
decode <- function(x){
case_when(x == 0 ~ "beach",
x == 1 ~ "forest",
x == 2 ~ "mountain"
)
}
pred1_test <- sapply(pred1_test, decode)
pred1_test
#> [1] "beach"    "mountain" "beach"    "beach"    "beach"    "beach"
#> [7] "beach" "beach" "beach" "beach" "beach" "beach"
#> [13] "mountain" "beach" "beach" "beach" "beach" "beach"
#> [19] "beach" "mountain" "beach" "beach" "beach" "mountain"
#> [25] "beach" "beach" "beach" "beach" "beach" "mountain"
#> [31] "beach" "beach" "beach" "beach" "beach" "mountain"
#> [37] "forest" "beach" "beach" "beach" "beach" "beach"
#> [43] "beach" "beach" "beach" "beach" "beach" "beach"
#> [49] "beach" "beach" "beach" "beach" "forest" "beach"
#> [55] "beach" "beach" "beach" "beach" "beach" "beach"
#> [61] "beach" "beach" "beach" "beach" "beach" "beach"
#> [67] "beach" "beach" "beach" "beach" "beach" "beach"
#> [73] "beach" "beach" "beach" "beach" "beach" "beach"
#> [79] "beach" "beach" "beach" "beach" "mountain" "beach"
#> [85] "beach" "beach" "beach" "beach" "beach" "beach"
#> [91] "beach" "beach" "beach" "beach" "beach" "beach"
#> [97] "beach" "beach" "beach" "beach" "beach" "beach"
#> [103] "mountain" "forest" "beach" "beach" "beach" "beach"
#> [109] "beach" "mountain" "forest" "forest" "forest" "forest"
#> [115] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [121] "forest" "forest" "forest" "forest" "forest" "forest"
#> [127] "forest" "mountain" "forest" "forest" "mountain" "forest"
#> [133] "forest" "forest" "forest" "forest" "mountain" "mountain"
#> [139] "forest" "forest" "forest" "forest" "forest" "forest"
#> [145] "forest" "forest" "forest" "forest" "forest" "forest"
#> [151] "forest" "forest" "forest" "forest" "forest" "mountain"
#> [157] "forest" "forest" "mountain" "forest" "forest" "forest"
#> [163] "forest" "forest" "forest" "forest" "forest" "forest"
#> [169] "forest" "forest" "forest" "forest" "forest" "forest"
#> [175] "mountain" "forest" "forest" "forest" "forest" "forest"
#> [181] "forest" "forest" "forest" "forest" "forest" "forest"
#> [187] "mountain" "forest" "forest" "forest" "forest" "forest"
#> [193] "forest" "forest" "beach" "mountain" "mountain" "mountain"
#> [199] "mountain" "mountain" "mountain" "beach" "mountain" "mountain"
#> [205] "mountain" "mountain" "beach" "mountain" "beach" "mountain"
#> [211] "mountain" "mountain" "mountain" "mountain" "beach" "mountain"
#> [217] "mountain" "mountain" "beach" "mountain" "mountain" "mountain"
#> [223] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [229] "mountain" "beach" "mountain" "mountain" "mountain" "mountain"
#> [235] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [241] "mountain" "beach" "mountain" "mountain" "mountain" "mountain"
#> [247] "beach" "mountain" "mountain" "mountain" "mountain" "beach"
#> [253] "mountain" "mountain" "mountain" "beach" "mountain" "beach"
#> [259] "mountain" "mountain" "mountain" "mountain" "mountain" "beach"
#> [265] "beach" "mountain" "mountain" "mountain" "mountain" "mountain"
#> [271] "mountain" "beach" "mountain" "mountain" "beach" "mountain"
#> [277] "forest" "mountain" "mountain" "beach" "mountain" "mountain"
#> [283] "mountain" "mountain" "forest" "mountain" "mountain" "mountain"
#> [289] "mountain" "mountain" "mountain" "mountain" "mountain" "mountain"
Evaluating the model using the confusion matrix. This model performs better than the previous one because additional CNN layers were stacked to extract more features from the images.
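The matrix below can be reproduced with caret's `confusionMatrix()`, comparing the decoded predictions against the actual labels extracted from the folder names. A minimal sketch, assuming `pred1_test` and `val_test_data` from the steps above:

```r
# Compare decoded predictions with the actual class labels
confusionMatrix(data = as.factor(pred1_test),
                reference = as.factor(val_test_data$class))
```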
#> Confusion Matrix and Statistics
#>
#> Reference
#> Prediction beach forest mountain
#> beach 98 0 17
#> forest 3 75 2
#> mountain 9 9 81
#>
#> Overall Statistics
#>
#> Accuracy : 0.8639
#> 95% CI : (0.8194, 0.901)
#> No Information Rate : 0.3741
#> P-Value [Acc > NIR] : < 0.0000000000000002
#>
#> Kappa : 0.7943
#>
#> Mcnemar's Test P-Value : 0.01929
#>
#> Statistics by Class:
#>
#> Class: beach Class: forest Class: mountain
#> Sensitivity 0.8909 0.8929 0.8100
#> Specificity 0.9076 0.9762 0.9072
#> Pos Pred Value 0.8522 0.9375 0.8182
#> Neg Pred Value 0.9330 0.9579 0.9026
#> Prevalence 0.3741 0.2857 0.3401
#> Detection Rate 0.3333 0.2551 0.2755
#> Detection Prevalence 0.3912 0.2721 0.3367
#> Balanced Accuracy 0.8993 0.9345 0.8586
The model_tuned confusion matrix shows that the model classifies beach, forest, and mountain with 86.39% accuracy on the test data set. Here the most important metric is accuracy, because it represents the model's overall ability to classify every class correctly and no single class is of special interest. The model could be improved further by modifying the architecture (number of neurons, number of layers, activation function type) or the hyperparameters (number of epochs, batch size, optimizer type, learning rate).
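As an illustration of such modifications, an extra convolution–pooling block and a smaller learning rate could be added along these lines. This is only a sketch: the filter counts, dense size, and learning rate here are arbitrary choices, not the values actually used for model_tuned.

```r
# Hypothetical deeper architecture with an extra conv-pool block
model_deeper <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), activation = "relu",
                input_shape = c(128, 128, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3, 3), activation = "relu") %>% # extra block
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 3, activation = "softmax") # 3 target classes

model_deeper %>% compile(
  loss = "categorical_crossentropy",
  optimizer = optimizer_adam(learning_rate = 0.0005), # adjusted learning rate
  metrics = "accuracy"
)
```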
In this step the images are not grouped by class: the test folder is not divided into three class folders.
test <- read.csv("data/image-data-test.csv")
test_data <- data.frame(file_name = paste0("data/test1/",
test$id))
head(test_data, 10)
#> [1] 294 128 128 3
#> [1] 0 1 2 0 2 2 2 1 1 0
#> [1] "beach" "forest" "mountain" "beach" "mountain" "mountain"
#> [7] "mountain" "forest" "forest" "beach"
In summary, a deep learning model was built that finds the features and characteristics of the images (beach, forest, mountain). The model is a Convolutional Neural Network (CNN) and performs very well because it was tuned by stacking more CNN layers (modifying the architecture) to extract more information from the images and by adjusting the hyperparameters (number of epochs, batch size, optimizer type, learning rate). The goal is achieved: the tuned model classifies the three classes (beach, forest, mountain) with high accuracy (86.39%) and a small gap between test and train accuracy, which indicates the model is neither underfit nor overfit. Potential business applications include breast cancer image classification, self-driving cars, product defect detection, animal species detection, car plate / ship code detection, and more.